Constrained mining of patterns in large databases

Author

  • Sau Dan Lee

Abstract

A theoretical framework is introduced that models data mining problems as the answering of queries in inductive databases. Inductive queries are requests to find patterns in a database that satisfy user-specified constraints. Analysis of the answer sets of inductive queries composed from anti-monotonic and monotonic basic predicates using Boolean operators reveals interesting properties, such as “dimension”, that are useful for query optimization. The concept of version spaces is extended to “generalized version spaces” to encapsulate such answer sets. Generalized version spaces are closed under the usual set operations, providing a closure property akin to that of relational algebra. This generic framework has been applied to various application domains, and algorithms and optimization techniques have been devised that use the theoretical results to answer queries to inductive databases efficiently. Experiments show that these techniques are applicable.
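To make the terms in the abstract concrete, the following is a minimal sketch, not the author's algorithms. It assumes itemsets as the pattern language, a hypothetical toy database, and brute-force enumeration. All names (`TRANSACTIONS`, `min_frequency`, `contains`, `borders`) are illustrative. It evaluates the conjunction of one anti-monotonic predicate (minimum frequency) and one monotonic predicate (must contain an item), then summarizes the answer set by its most general border G and most specific border S, as in a version space.

```python
from itertools import combinations

# Hypothetical toy database: each transaction is a set of items.
TRANSACTIONS = [
    frozenset("abc"),
    frozenset("abd"),
    frozenset("ab"),
    frozenset("acd"),
    frozenset("bc"),
]
ITEMS = sorted(set().union(*TRANSACTIONS))


def frequency(pattern, transactions):
    """Count the transactions that contain every item of the pattern."""
    return sum(1 for t in transactions if pattern <= t)


def min_frequency(threshold):
    """Anti-monotonic predicate: if an itemset passes, so does every subset."""
    return lambda p: frequency(p, TRANSACTIONS) >= threshold


def contains(item):
    """Monotonic predicate: if an itemset passes, so does every superset."""
    return lambda p: item in p


def all_patterns():
    """Enumerate every itemset over ITEMS (feasible only for toy data)."""
    for size in range(len(ITEMS) + 1):
        for combo in combinations(ITEMS, size):
            yield frozenset(combo)


def answer_set(predicate):
    """Brute-force answer to an inductive query."""
    return {p for p in all_patterns() if predicate(p)}


def borders(solutions):
    """Return (G, S): the minimal and maximal solutions under set inclusion."""
    g = {p for p in solutions if not any(q < p for q in solutions)}
    s = {p for p in solutions if not any(q > p for q in solutions)}
    return g, s


anti = min_frequency(2)   # anti-monotonic basic predicate
mono = contains("a")      # monotonic basic predicate
solutions = answer_set(lambda p: anti(p) and mono(p))
G, S = borders(solutions)

print("G:", sorted("".join(sorted(p)) for p in G))
print("S:", sorted("".join(sorted(p)) for p in S))
```

In this sketch the two borders are enough to recover the answer set: a pattern is a solution exactly when it lies between some element of G and some element of S. Queries that also use disjunction or negation can yield answer sets that are unions of several such spaces, which is the situation the generalized version spaces of the abstract are meant to capture.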

Similar resources

A Proposed Data Mining Methodology and its Application to Industrial Procedures

Data mining is the process of discovering correlations, patterns, trends or relationships by searching through a large amount of data stored in repositories, corporate databases, and data warehouses. Industrial procedures with the help of engineers, managers, and other specialists, comprise a broad field and have many tools and techniques in their problem-solving arsenal. The purpose of this st...

A Probabilistic Bayesian Classifier Approach for Breast Cancer Diagnosis and Prognosis

Basically, medical diagnosis problems are the most effective component of treatment policies. Recently, significant advances have been made in medical diagnosis using data mining techniques. Data mining, or knowledge discovery, searches large databases to discover patterns and evaluate the probability of future occurrences. In this paper, a Bayesian classifier is used as a non-linear dat...

A Model for Extracting Information from Text Documents, Based on Text Mining, in the E-Learning Domain

As computer networks become the backbones of science and the economy, enormous quantities of documents become available. Text mining techniques have therefore been used to extract useful information from textual data. Text mining has become an important research area that discovers unknown information, facts, or new hypotheses by automatically extracting information from different written documents. T...

Mining Multiple Large Data Sources

Effective data analysis using multiple databases requires highly accurate patterns. Local pattern analysis might extract low quality patterns from multiple large databases. Thus, it is necessary to improve mining multiple databases using local pattern analysis. We present existing specialized as well as generalized techniques for mining multiple large databases. We formalize the idea of multi-d...

Database Transposition for Constrained (Closed) Pattern Mining

Recently, several works have proposed a new way to mine patterns in databases with pathological size. For example, experiments in genome biology usually provide databases with thousands of attributes (genes) but only tens of objects (experiments). In this case, mining the “transposed” database runs through a smaller search space, and the Galois connection allows one to infer the closed patterns of the...

Publication date: 2006